Abstract

Background: Microarray data analysis presents a significant challenge to researchers who are unable to use the
powerful Bioconductor platform and its numerous tools because they lack knowledge of the R language. Among the few
existing software programs that offer a graphical user interface to Bioconductor packages, none has implemented a
comprehensive strategy to address the accuracy and reliability issues of microarray data analysis that stem from the
well-known probe design problems associated with many widely used microarray chips. There is also a lack of tools that
would expedite the functional analysis of microarray results.

Findings: We present Microarray Я US, an R-based graphical user interface that implements over a dozen popular
Bioconductor packages to offer researchers a streamlined workflow for routine differential microarray expression
data analysis without the need to learn the R language. To enable a more accurate analysis and interpretation
of microarray data, we incorporated the latest custom probe re-definition and re-annotation for Affymetrix and
Illumina chips. A versatile microarray results output utility tool was also implemented for easy and fast generation of
input files for over 20 of the most widely used functional analysis software programs.

Conclusion: Coupled with a well-designed user interface, Microarray Я US leverages cutting-edge Bioconductor
packages for researchers with no knowledge of the R language. It also enables a more reliable and accurate microarray
data analysis and expedites downstream functional analysis of microarray results.
Findings

Background 

Microarray technology has been widely used for global
gene expression profiling. According to the major public
microarray data repositories such as NCBI GEO
(http://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress
(http://www.ebi.ac.uk/arrayexpress/), the overwhelming
majority of microarray studies were performed on Affymetrix
GeneChips and Illumina BeadArrays, with human, mouse and
rat being the most common model organisms. Finding
differentially expressed genes (DEG) under various
experimental conditions is the primary goal of these studies.

With hundreds of published packages, the R-based
statistical platform Bioconductor [1] is a major solution for
microarray data analysis. However, the command-line
driven Bioconductor and its packages can be inconvenient
even for experienced users dealing with multiple-step
analyses, and are virtually inaccessible to users without a
solid knowledge of R. Graphical user interfaces (GUI) for
Bioconductor packages have been developed to enable
biology researchers to use cutting-edge algorithms without
the need to learn R, notably affylmGUI for Affymetrix data
analysis [2] and oneChannelGUI for both Affymetrix and
Illumina data analysis [3]. Web-based software such as
WebArray [4] and CARMAweb [5] was also developed to
offer a GUI to Bioconductor packages. Many of these
programs have been actively expanded since their initial
releases, but as more and more functionality is added, they
become increasingly cumbersome to learn and use,
especially for those who are mostly interested in
differential expression analysis.

A shortfall common to all of these programs is that they
generally do not systematically address the probe design
problems associated with many microarray chips. The
Affymetrix 3' IVT GeneChips have been the most popular
platform for global gene expression analysis in the past
decade. They consist of probe sets, each containing 11-20
pairs of 25-mer probes targeting a gene or transcript.
Although the probe sets were designed with the most
complete information available at the time, the tremendous
progress in genome sequencing and annotation in recent
years has rendered an increasing number of them outdated.
Several studies [6-8] indicated that a substantial percentage
(30-70%) of Affymetrix probe sets contain at least one probe
that is either non-unique, no-target, mis-targeted, or
overlapping with known SNPs in the central region. Besides
adding noise, these problematic probes also affect the
accuracy of expression value estimation [9,10 and references
therein]. Furthermore, these studies have shown that the
annotations of a significant portion of the probe sets are
either outdated or incorrect based on the latest genomic
knowledge. Updated Affymetrix probe set annotation not
only benefits our understanding of microarray results, but
also improves the cross-platform reproducibility of
microarray experiments [11,12]. Although their chip design
philosophy differs from that of Affymetrix's GeneChips,
Illumina's BeadArrays have similar problems in terms of
problematic probes and outdated probe annotations [13,14].

Implementation 

Written in R, Microarray Я US is a cleanly designed
GUI intended specifically for users who have no knowledge
of R. It provides a streamlined workflow for analyzing
expression microarray data (Figure 1). The program console
consists of a top menu bar, as well as a Work Flow Log
and a Task Status to allow users to easily perform and
track the status of their data analysis (Figure 2). For
information on all the Bioconductor packages implemented
in this software as well as their publications, refer to
Additional file 1 (List of the implemented Bioconductor
packages).

Results and discussion 

Microarray Я US was developed not only to provide a
simple and streamlined workflow to researchers who are
mainly interested in a fast differential gene expression
analysis, but also to improve the accuracy and reliability
of the analysis, as well as to expedite downstream
functional analysis of the microarray results. In addition to
many carefully planned design characteristics aimed at
enhancing its usability, Microarray Я US provides the
following unique features:

Support for custom chip description files (CDF) for major
3' IVT Affymetrix GeneChips and probe re-annotation for
major Affymetrix GeneChips and Illumina BeadArrays

To enable researchers to take advantage of the latest
research on probe set re-definition and re-annotation, we
implemented the custom CDF and probe set re-annotation
by Dai et al. (2005) and by Risueno et al. (2010) for
Affymetrix GeneChips, and the probe re-annotation by Du
et al. (2008) and Barbosa-Morais et al. (2009) for Illumina
BeadArrays. To mitigate the undesirable consequences that
arise from the aforementioned microarray probe design
problems, Dai et al. (2005) and Risueno et al. (2010) used
the latest genome/transcriptome sequences to perform strict
probe re-alignment and mapping, and discarded the 30-60%
of the original Affymetrix probes that were problematic.
The remaining probes were re-defined into new probe sets
(in the form of custom CDF) and re-annotated with the
latest genomic annotation. An independent evaluation of
Dai et al.'s study concluded that the updated probe set
definitions resulted in significant improvement of both
precision and accuracy of expression level analysis [15].
For Illumina arrays, Du et al. (2008) mapped probe
sequences against the latest corresponding RefSeq
sequences, eliminated up to 30% of the original probes
because they lacked a unique and perfect match to a single
Entrez gene, and re-annotated the remaining probes with
the latest genomic annotations. Barbosa-Morais et al.
(2009) also re-defined the probes against the latest genome
and transcriptome but used less strict rules for
excluding uninformative probes.

To our knowledge, Microarray Я US is the only microarray
software that implements multiple custom CDF
(Affymetrix) and probe set/probe re-annotation (Affymetrix
and Illumina) for a more reliable gene expression analysis.
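
The filtering rule described above for Illumina probes (keep only probes with a unique, perfect match to a single gene) can be illustrated with a minimal Python sketch. The input structure, the function name `filter_unique_probes`, and the probe and gene identifiers are hypothetical and chosen for illustration only; the published re-annotation pipelines rely on sequence alignment against reference databases and are considerably more involved.

```python
from collections import defaultdict


def filter_unique_probes(probe_hits):
    """Keep probes that match perfectly to exactly one gene.

    probe_hits maps a probe ID to a list of (gene_id, is_perfect_match)
    tuples, one per alignment hit (hypothetical input format).
    Returns (kept, discarded): kept maps gene_id to a sorted list of
    probe IDs, discarded is a sorted list of probe IDs.
    """
    kept = defaultdict(list)
    discarded = []
    for probe_id, hits in probe_hits.items():
        # Distinct genes hit with a perfect match.
        perfect_genes = {gene for gene, perfect in hits if perfect}
        if len(perfect_genes) == 1:
            kept[next(iter(perfect_genes))].append(probe_id)
        else:
            discarded.append(probe_id)
    return {gene: sorted(ids) for gene, ids in kept.items()}, sorted(discarded)


# Toy example with made-up identifiers.
hits = {
    "probe_1": [("geneA", True)],
    "probe_2": [("geneA", True), ("geneB", True)],  # non-unique
    "probe_3": [],                                  # no target
    "probe_4": [("geneB", False)],                  # imperfect match only
    "probe_5": [("geneA", True), ("geneA", True)],  # two hits, same gene
}
kept, discarded = filter_unique_probes(hits)
print(kept)
print(discarded)
```

Grouping the surviving probes by gene mirrors, in a very simplified way, the idea of re-defining probe sets after problematic probes have been removed.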

Quick generation of input files for comprehensive 
functional analysis of microarray results 

The statistical analysis of microarray raw data often results
in lists of hundreds of DEG. Understanding the underlying
mechanisms and functional ramifications of such expression
changes is becoming the most important and daunting task
of 'Omics research. In the last decade, several hundred
bioinformatics tools have been developed for biological
interpretation of large gene lists at a systems biology level
[reviewed in 16]. A comprehensive functional analysis of
microarray results commonly requires the use of multiple
tools, as they differ in underlying statistical methods,
annotation contents and analytical capabilities [16,17].
Different tools usually require different types and formats
of input files, and manually converting microarray results
into these files is a very laborious task. To expedite
comprehensive functional analysis, we implemented a
results output utility tool that can instantly generate input
files for some 20 of the most widely used commercial and
open access functional analysis software programs (see
Additional file 2: List of the supported functional analysis
software). To our knowledge, Microarray Я US is the only
microarray software that provides such time-saving
functionality.

Figure 1 Typical Microarray Я US workflow. Microarray Я US provides a streamlined workflow for a typical differential gene expression analysis
task.

Figure 2 Microarray Я US console. Microarray Я US features a linear step-wise workflow for analyzing microarray raw data. When using
Microarray Я US, users can simply follow the workflow by going through the Navigation Bar from left to right. Major analysis steps are also clearly
marked in the Task Status section. Task to be Completed directs users to the next task in the workflow.

Microarray Я US key functionalities
Data import 

Microarray Я US supports major 3' IVT Affymetrix
GeneChips and Illumina BeadArrays for human, mouse and
rat (see Additional file 3: List of the supported microarray
data types). In addition to user data, public data
from GEO and ArrayExpress can be directly downloaded
within the software via the implementation of GEOquery,
GEOmetadb, and ArrayExpress.

Custom CDF and probe re-annotation selections 

For Affymetrix GeneChips, users are given a choice between
the original manufacturer CDF and custom CDF [6, 7,
Brainarray version 13], along with the corresponding probe
set annotations. For Illumina BeadArrays, the original
manufacturer annotation and two custom re-annotations
[13,14] are available.

Data preprocessing 

For Affymetrix data, Microarray Я US offers several
commonly used algorithms as implemented in the RMA,
gcRMA, MAS5 and dChip packages. An advanced
option is also provided to allow users to select
methods for background correction, PM correction,
normalization, and probe set summarization. For
Illumina data, the software accepts preprocessed data
output from GenomeStudio and supports fully
customizable preprocessing with the lumi package for
non-preprocessed data.

Quality control and exploratory analysis 

For Affymetrix data, Microarray Я US implements
Array Quality Metrics and QCreport. For Illumina data, the
quality control method implemented in the lumi package
is supported.

For exploratory data analysis, Microarray Я US supports
both Principal Component Analysis and hierarchical
clustering analysis via the implementation of the made4 and
stats packages.

Differential expression analysis 

Four widely used statistical packages are implemented,
including Linear Models for Microarray Data (limma, with
advanced options for multiple fixed and random factors),
Significance Analysis of Microarrays (SAM, both paired
and unpaired), Rank Product Test (RankProd), and
maSigPro (time course data).
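
Whichever statistical package is chosen, a DEG list is typically obtained by thresholding the resulting table on fold change and adjusted p-value. The following Python sketch illustrates only that final filtering step. It is not the method of limma, SAM, RankProd or maSigPro; the Benjamini-Hochberg adjustment is an assumption about how "adjusted p" is computed, the function names and toy data are hypothetical, and the default thresholds (fold change of 2, adjusted p-value of 0.05) simply mirror those embedded in the example output file names of Figure 4.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_deg(results, min_fold_change=2.0, max_adj_p=0.05):
    """Filter (gene, log2_fold_change, raw_p) rows into a DEG list."""
    adjusted = bh_adjust([p for _, _, p in results])
    min_log2fc = math.log2(min_fold_change)
    return [
        (gene, log2fc, adj_p)
        for (gene, log2fc, _), adj_p in zip(results, adjusted)
        if abs(log2fc) >= min_log2fc and adj_p <= max_adj_p
    ]


# Toy results table with made-up values.
results = [
    ("g1", 2.0, 0.001),
    ("g2", -1.5, 0.01),
    ("g3", 0.2, 0.02),   # significant but below the fold-change cutoff
    ("g4", 1.2, 0.5),    # large change but not significant
]
deg = select_deg(results)
print([gene for gene, _, _ in deg])
```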

Power analysis 

Power analysis on sample size and detection efficiency
for p value or fold change is supported in Microarray
Я US via the implementation of the ssize package.
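
To give a sense of what a sample size calculation involves, the sketch below uses the textbook normal approximation for comparing two group means, n = 2 * ((z_(1-alpha/2) + z_power) * sigma / delta)^2 per group. This is not the algorithm of the ssize package, which is not reproduced here; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-sample test.

    delta: difference in means to detect (e.g. on the log2 scale)
    sigma: common standard deviation of the measurements
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Single test at alpha = 0.05.
n_single = samples_per_group(delta=1.0, sigma=1.0)
# A stricter per-gene alpha, as a crude stand-in for multiple-testing
# correction across 10,000 genes, requires more samples.
n_strict = samples_per_group(delta=1.0, sigma=1.0, alpha=0.05 / 10000)
print(n_single, n_strict)
```

The second call illustrates why power analysis matters for microarray studies: testing thousands of genes at once demands a much smaller per-gene significance level, which in turn raises the number of arrays needed.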

Results output 

With easy-to-follow dialog windows, users can output
a full table of statistical results or DEG lists.
Visualizations of DEG lists with heatmaps or Venn
diagrams are also available via the implementation of
the gplots and limma packages.

Figure 3 Result Output Utility dialog windows. The Result Output Utility Tool of Microarray Я US exports microarray results into input files for
over 20 commonly used functional analysis software programs in their corresponding formats. This function can also be used for converting results
generated with other microarray analysis software. Step 1 of the dialog takes three input files: the complete gene list (required by GeneTrail,
GSEA-P, GenMAPP 2 and GOrilla), the filtered gene list (required by IPA, NextBio, NIH DAVID, GeneTrail, GeneCodis, WebGestalt, FatiGO+,
ToppCluster, GSEA-P, TransFind, TFactS, Onto-tools Pathway-Express, Connectivity Map, FuncAssociate 2.0, GoMiner, MAGIA and GeneSet2miRNA)
and the preprocessed data file (required by GSEA-P, EXALT, MMIA and GenePattern). Step 2 lets users select the target functional analysis software.

Figure 4 Examples of output results files. The statistical methods, experimental factor and the functional analysis software name are
automatically embedded in the names of output files. The output files can then be directly imported into the corresponding functional analysis
software. In the example shown, file names begin with SAM.Unpaired_Age, filtered lists additionally carry the thresholds FC_2_AdjP_0.05, and
the file extensions include .txt, .gct, .cls, .rnk and .grp.

The Gene List Output Utility can be used to instantly
convert microarray results into input files for
over 20 functional analysis software programs (Figure 3:
screenshot of the dialog windows for generating input
files for functional analysis tools). It can also be
used for microarray results generated with third-party
microarray software with minimal reformatting.
A carefully designed default file naming scheme was
implemented to allow users to easily locate the output
files for each selected functional analysis tool
(Figure 4: examples of output results files for
downstream functional analysis).
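
The idea of a self-describing file name combined with a tool-specific file format can be sketched in Python as follows. The naming pattern is modeled loosely on the file names visible in Figure 4 and is an assumption, not the software's actual scheme; the function names and the example factor, contrast and genes are hypothetical. The writer produces the two-column, tab-delimited layout of a GSEA ranked list (.rnk) file.

```python
import csv
import io


def output_file_name(method, factor, contrast, tool, extension,
                     fold_change=None, adj_p=None):
    """Compose method_factor_contrast[_FC_x_AdjP_y].tool.extension."""
    stem = f"{method}_{factor}_{contrast}"
    # Thresholds are only embedded for filtered gene lists.
    if fold_change is not None and adj_p is not None:
        stem += f"_FC_{fold_change}_AdjP_{adj_p}"
    return f"{stem}.{tool}.{extension}"


def write_rnk(rows, handle):
    """Write (gene, score) pairs as a tab-delimited ranked list.

    Rows are sorted by descending score before writing.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene, score in sorted(rows, key=lambda row: row[1], reverse=True):
        writer.writerow([gene, score])


# Hypothetical analysis: unpaired SAM on an "Age" factor.
filtered_name = output_file_name(
    "SAM.Unpaired", "Age", "old_vs_young", "DAVID", "txt",
    fold_change=2, adj_p=0.05)
ranked_name = output_file_name(
    "SAM.Unpaired", "Age", "old_vs_young", "GSEA", "rnk")

buffer = io.StringIO()
write_rnk([("geneB", -1.2), ("geneA", 3.4)], buffer)
print(filtered_name)
print(ranked_name)
print(buffer.getvalue(), end="")
```

Embedding the method, factor and thresholds in the name means that a folder of exported files documents how each list was produced, which is the practical benefit the naming scheme aims for.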

Conclusion 

A GUI to over a dozen widely used Bioconductor
packages with enhanced usability, Microarray Я US
provides a streamlined workflow for routine differential
gene expression analysis based on Affymetrix and Illumina
chips for users with no knowledge of the R language. With
its unique implementation of several up-to-date Affymetrix
custom CDF and probe set re-annotations for both
Affymetrix and Illumina platforms, this tool facilitates a
more accurate and precise microarray data analysis. The
versatile results output utility tool enables fast and easy
generation of input files for over 20 of the most popular
functional analysis software programs.

Availability and requirements 

Microarray Я US is available for Windows (both 32 and
64 bit), Mac OS, and Linux/Unix under the Open GPL
license at http://norris.usc.libguides.com/MicroarrayRUS
(free registration required).

Periodic updates of the custom CDF will be made when
major revisions become available.
Additional file 1: List of the implemented Bioconductor packages.

Complete list of the implemented Bioconductor packages with brief 
descriptions and references. 

Additional file 2: List of the supported functional analysis software. 

Complete list of the supported functional analysis software
for the Gene List Output Utility Tool. Access information, methods, input
file requirements, supported organisms, matching Microarray Я US output
file, and other details are listed for each supported software.

Additional file 3: List of the supported microarray data types. 

Complete list of the supported microarray chips. 